MMF Analysis

Edited By Felicia Luo

Contents

1. Introduction

2. The Data

3. The Models

4. The Results and Explanation

1. Introduction

Background

Research Questions

-What factors affect MMF yields and AUM/Flows? How? (Fund size, Fed reate, etc.)

-Which rating agencies (Moody's, Fitch, SP500) is important for predicting AUM/Flows?

-Do AUM have seasonalities? How does Fed Rate influnce portfolio AUM?Using historical portfolio AUM to forcast future AUM trends

2. The data

Redshift Connection

In [1]:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
import sagemaker
sess = sagemaker.Session()
import boto3
import datetime
import warnings
warnings.filterwarnings('ignore')
In [1148]:
import warnings
warnings.filterwarnings('ignore')
In [7]:
#connet DB using sqlalchemy
#create_engine('postgresql://username:password@host:post/database name')
engine = create_engine('postgresql://wluo:JAFe!r5KGWRg#N$s257&@lm-edm-prod-product.czji6gyk9wtb.us-east-1.redshift.amazonaws.com:5439/edmprod')
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)

Loading the EDH Data

In [185]:
#read sql
#data_mmf = pd.read_sql_query("""select * from pdc.mmkt_fund_intel_dly""",engine)
In [273]:
#read pickle file - mmkt funf intel dly
#data_mmf.to_pickle("./mmf.pkl")
edh_mm = pd.read_pickle("./mmf.pkl")
In [214]:
#read EFFR from S3 Bucket
bucket ="lm-edm-product-prod"
s3 = boto3.client('s3')

file_name ="032005to032020_EFFR.csv"
obj = s3.get_object(Bucket=bucket, Key = file_name)
fed_rate = pd.read_csv(obj['Body'])
fed_rate = fed_rate[fed_rate.EFFR != '.']

#read CNFC from S3 Bucket
file_name_1 ="032005to032020_CNCF.csv"
obj_1 = s3.get_object(Bucket=bucket, Key = file_name_1)
cncf = pd.read_csv(obj_1['Body'])

#read fund_imoneynet from S3 Bucket
file_name_2 = "Fund_Yield_-ad_hoc.xlsx"
obj_2 = s3.get_object(Bucket=bucket, Key = file_name_2)
df_data_1 = pd.read_excel(obj_2['Body'],header=1)


#change data type
fed_rate['DATE'] = pd.to_datetime(fed_rate['DATE'])
cncf['DATE'] = pd.to_datetime(cncf['DATE'])
df_data_1['Date'] = pd.to_datetime(df_data_1['Date'], dayfirst=True, errors='coerce')
fed_rate['EFFR'] = fed_rate['EFFR'].astype('float')
In [277]:
#bucket ="lm-edm-product-prod"
#s3 = boto3.client('s3')
#file_name_2 = "Fund_Yield_-ad_hoc.xlsx"
#obj_2 = s3.get_object(Bucket=bucket, Key = file_name_2)
#fund_imoneynet = pd.read_excel(obj_2['Body'])
In [276]:
edh_mm_list = ['assets','chg1d','chg7d','gr1d','gr30d','inflow','outflow','netflow','wam','wal','one_day_percent','seven_day_percent',
              'thirty_day','div_factor','mnav','dla','exp_percent','ytdchg','ass12_31','seven_d12_31','cnav','cusip','money_fund','as_of_dt']
edh_mm = edh_mm[edh_mm_list]
edh_mm = edh_mm.loc[:,~edh_mm.columns.duplicated()]
edh_mm['as_of_dt'] = pd.to_datetime(edh_mm['as_of_dt'])
In [219]:
df_data = df_data_1

Feature description/data cleaning

In [12]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
#import pyfolio as pf
import seaborn as sns
import math
plt.style.use("bmh")
from pandas.plotting import scatter_matrix
In [217]:
print("Preview - mmkt_fund_intel_dly data")
print("===================================")
edh_mm.describe()
Preview - mmkt_fund_intel_dly data
===================================
Out[217]:
assets chg1d chg7d gr1d gr30d inflow outflow netflow wam wal one_day_percent seven_day_percent thirty_day div_factor mnav dla exp_percent ytdchg ass12_31 seven_d12_31
count 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000 7134.000000
mean 6030.946682 7.016749 39.933517 2.111994 2.127601 513.316614 -470.450685 40.330001 29.657686 77.036010 1.720147 1.718735 1.735754 0.000052 1.000179 53.667207 0.391847 504.749675 5526.197060 1.578553
std 14779.761396 300.781765 563.235191 0.450945 0.397947 1111.282483 1055.279166 478.394453 10.110618 29.089192 0.509242 0.485017 0.465286 0.000025 0.000282 27.536905 0.293057 3485.784283 13637.939689 0.770560
min 0.500000 -4580.532700 -6603.663600 0.660949 0.793693 -7.915062 -8640.664000 -3880.465600 1.000000 1.000000 0.000000 0.000694 0.020000 0.000000 0.999300 6.050000 -0.140000 -28224.502000 1.000000 0.000000
25% 133.965257 -2.425630 -5.756732 1.710000 1.764330 1.040720 -244.380980 -5.195179 23.000000 63.000000 1.381692 1.400000 1.438299 0.000039 1.000000 33.710000 0.190000 -33.656170 106.448130 1.240000
50% 567.040650 0.000000 0.009040 2.220000 2.220000 32.713554 -21.961279 0.319794 30.627750 85.000000 1.776962 1.778856 1.784003 0.000050 1.000100 46.100000 0.350000 0.376372 580.207030 1.806651
75% 3833.630450 5.961728 22.694517 2.453384 2.450000 321.713090 -0.552231 43.336706 36.779600 100.000000 2.120000 2.111118 2.107305 0.000060 1.000284 82.460000 0.470000 187.601475 3368.872000 2.212354
max 111020.164000 4684.567400 9976.612000 4.478925 4.359433 8980.613000 758.395500 6239.010700 85.862580 121.000000 3.711831 2.663398 2.650555 0.000251 1.001700 100.000000 2.460000 35152.887000 107050.010000 2.600000
In [15]:
print("Preview - fed rate data")
print("=======================")
#print(fed_rate)
fed_rate.head()
Preview - fed rate data
=======================
Out[15]:
DATE EFFR
0 2005-03-17 2.68
1 2005-03-18 2.70
2 2005-03-21 2.71
3 2005-03-22 2.72
4 2005-03-23 2.73
In [16]:
fed_rate=fed_rate.replace('-',np.nan)
fed_rate=fed_rate.fillna(method='ffill')
fed_rate=fed_rate.fillna(0)
In [17]:
print("Preview - CNCF")
print("=======================")
cncf.head()
Preview - CNCF
=======================
Out[17]:
DATE CNCF
0 2005-01-01 1505.909
1 2005-04-01 1533.232
2 2005-07-01 1510.044
3 2005-10-01 1597.406
4 2006-01-01 1584.485
In [228]:
print("Preview - iMoneyNet")
print("=======================")
df_data.head()
Preview - iMoneyNet
=======================
Out[228]:
SubCategory Current Fund Name Share Class Date Investment Categories Current Investor Categories Current Investor Categories Historical Fitch Rating Moody's Rating S&P Rating ... Incd 12b-1 Fee (mo) Incd Expense Ratio (mo) Incd Mgmt Fee (mo) Incd Other Fees (mo) Incd Shr. Svc. Fee (mo) Fund Adviser Share Class Assets Chg (mo/$mils) CUSIP 12b_1 target
0 First Tier Instit AIG Money Market Fund/Cl A * Class A 2005-01-31 Prime (All) Taxable Inst Taxable Inst - - AAAm ... 0.0 0.32 0.25 0.07 0.0 AIG Asset Management, LLC 92.8 00758M790 no yes
1 First Tier Instit AIG Money Market Fund/Cl A * Class A 2005-02-28 Prime (All) Taxable Inst Taxable Inst - - AAAm ... 0.0 0.31 0.25 0.06 0.0 AIG Asset Management, LLC 319.7 00758M790 no yes
2 First Tier Instit AIG Money Market Fund/Cl A * Class A 2005-03-31 Prime (All) Taxable Inst Taxable Inst - - AAAm ... 0.0 0.31 0.25 0.06 0.0 AIG Asset Management, LLC -486.0 00758M790 no no
3 First Tier Instit AIG Money Market Fund/Cl A * Class A 2005-04-30 Prime (All) Taxable Inst Taxable Inst - - AAAm ... 0.0 0.32 0.25 0.07 0.0 AIG Asset Management, LLC 164.1 00758M790 no yes
4 First Tier Instit AIG Money Market Fund/Cl A * Class A 2005-05-31 Prime (All) Taxable Inst Taxable Inst - - AAAm ... 0.0 0.32 0.25 0.07 0.0 AIG Asset Management, LLC 13.6 00758M790 no yes

5 rows × 34 columns

In [299]:
def check_flag(df,dt_column = 'EFFR'):
    _flag = []
    for ix, row in df.iterrows():
        if row[dt_column] <= 0.25:
            _flag.append('Low EFFR')
        else:
            _flag.append('High EFFR')
    return _flag

fed_rate['flag']=check_flag(fed_rate)
In [221]:
df_data.drop(['Unnamed: 0'],axis =1,inplace=True)
In [223]:
df_data['Date'] = pd.to_datetime(df_data['Date'])
fed_rate['Date'] = pd.to_datetime(fed_rate['DATE'])
In [224]:
#switch the string data to numbers
#df_data=df_data.iloc[:,10:29]
df_data.iloc[:,10:29]=df_data.iloc[:,10:29].replace('-',np.nan)
df_data.iloc[:,10:29]=df_data.iloc[:,10:29].fillna(method='ffill')
df_data.iloc[:,10:29]=df_data.iloc[:,10:29].fillna(0)
df_data.iloc[:,30]=df_data.iloc[:,30].replace('-',np.nan)
df_data.iloc[:,30]=df_data.iloc[:,30].fillna(method='ffill')
df_data.iloc[:,30]=df_data.iloc[:,30].fillna(0)
In [225]:
for col in list(df_data.columns):
    if col in ['Portfolio Assets (mo/$mils)','1-DSY (dly)','3-Mo (NSAY)','1-Yr (NSAY)','5-Yr (NSAY)','10-Yr (NSAY)','Incd 12b-1 Fee (mo)','Incd Expense Ratio (mo)','Incd Mgmt Fee (mo)','Incd Other Fees (mo)','Incd Shr. Svc. Fee (mo)']:
        df_data[col].fillna(method='ffill')
        pd.to_numeric(df_data[col],errors='coerce')
In [73]:
#print(df_data.dtypes)
In [226]:
#create a func that ensures the data type is float
def ensure_float(x):
    if isinstance(x,np.float):
        return x
    else :
        return 0
In [339]:
#add a column to determine if is 12b-1 or non12b-1, positive netflow or negative netflow
df_data['Incd 12b-1 Fee (mo)']=df_data['Incd 12b-1 Fee (mo)'].replace('-',np.nan)
df_data['Incd 12b-1 Fee (mo)']=df_data['Incd 12b-1 Fee (mo)'].fillna(0)
df_data[['1-Yr (NCAGR)','Share Class Assets (mo/$mils)','Share Class Assets Chg (mo/$mils)']]=df_data[['1-Yr (NCAGR)','Share Class Assets (mo/$mils)','Share Class Assets Chg (mo/$mils)']].replace('-',np.nan)
df_data[['1-Yr (NCAGR)','Share Class Assets (mo/$mils)','Share Class Assets Chg (mo/$mils)']]=df_data[['1-Yr (NCAGR)','Share Class Assets (mo/$mils)','Share Class Assets Chg (mo/$mils)']].fillna(method='ffill')

df_data['Share Class Assets Chg (mo/$mils)']=df_data['Share Class Assets Chg (mo/$mils)'].apply(ensure_float)
df_data['1-Yr (NCAGR)']=df_data['1-Yr (NCAGR)'].apply(ensure_float)
df_data['Share Class Assets (mo/$mils)']=df_data['Share Class Assets (mo/$mils)'].apply(ensure_float)
df_data['Incd 12b-1 Fee (mo)']=df_data['Incd 12b-1 Fee (mo)'].apply(ensure_float)

df_data['12b_1']=np.where(df_data['Incd 12b-1 Fee (mo)']>0,'yes','no')
In [229]:
# add lagged share class asset for cagr
df_data['sca_lagged'] = (df_data.sort_values(by=['Date'], ascending=True)
                       .groupby(['Fund Name'])['Share Class Assets (mo/$mils)'].shift(12))
In [230]:
df_data['fund_cagr'] = df_data['Share Class Assets (mo/$mils)']/df_data['sca_lagged']-1
In [231]:
df_data['assets_rank']=df_data.groupby(['SubCategory Current','Date','12b_1'])['Share Class Assets (mo/$mils)'].rank(ascending=False)
df_data['fund_size_large']=np.where(df_data['assets_rank']>10,'no','yes')
In [232]:
#Create a subset for rating analysis
df_data[['Fitch Rating',"Moody's Rating",'S&P Rating']]=df_data[['Fitch Rating',"Moody's Rating",'S&P Rating']].replace('-','No Rating')
df_data_rating=df_data[['SubCategory Current','Fitch Rating',"Moody's Rating",'S&P Rating','Share Class Assets Chg (mo/$mils)','1-Yr (NCAGR)','fund_cagr']]
In [233]:
#Describe the 3 kind of rating
describe_rating_FR_1=df_data_rating.groupby(['SubCategory Current','Fitch Rating'])['Share Class Assets Chg (mo/$mils)'].describe()
describe_rating_MR_1=df_data_rating.groupby(['SubCategory Current',"Moody's Rating"])['Share Class Assets Chg (mo/$mils)'].describe()
describe_rating_SR_1=df_data_rating.groupby(['SubCategory Current',"S&P Rating"])['Share Class Assets Chg (mo/$mils)'].describe()
#describe_rating_SC=df_data_rating.groupby("SubCategory Current")['Share Class Assets Chg (mo/$mils)'].describe()
#describe_rating_FR_1
In [1197]:
describe_rating_FR_cagr=df_data_rating.groupby(['SubCategory Current','Fitch Rating'])[['fund_cagr']].describe()
describe_rating_MR_cagr=df_data_rating.groupby(['SubCategory Current',"Moody's Rating"])['1-Yr (NCAGR)'].describe()
describe_rating_SR_cagr=df_data_rating.groupby(['SubCategory Current',"S&P Rating"])['1-Yr (NCAGR)'].describe()
#describe_rating_MR_cagr
In [1531]:
full_table = df_data
In [1532]:
full_table=full_table.replace('-',np.nan)
full_table=full_table.fillna(method='ffill')
full_table=full_table.fillna(0)
In [237]:
cor=np.corrcoef(
       full_table['EFFR'],
        full_table['1-Yr (NCAGR)'])
In [238]:
cor
Out[238]:
array([[1.        , 0.83373898],
       [0.83373898, 1.        ]])
In [239]:
sns.set(style='whitegrid', context = 'notebook')
cols = ['assets','chg1d','chg7d','netflow','div_factor']
In [61]:
#ploting histogram to test data balance and feature distribution
#edh_eda.hist('asset_usd',bins=50)
#edh_eda.hist('sharpe_36m',bins=50)
#edh_eda.hist('std_dev_36m',bins=50)
#edh_eda['fund_type_d'].value_counts().plot(kind='bar')
In [735]:
#scatter_matrix(edh_mm[cols],alpha=0.2,figsize=(6,6),diagonal='kde')
In [1537]:
full_table=pd.merge(fed_rate,full_table, left_on =['DATE'], right_on = ['Date'])

EDH MM

In [279]:
#print(edh_mm.isnull().any())
#edh_mm.describe()
In [290]:
#create a subset from edh_mm that only contains cusip and net flow
edh_mm_sub = edh_mm[['cusip','money_fund','as_of_dt','netflow']]
edh_mm_sub = edh_mm_sub.dropna(subset=['cusip','netflow'])
#edh_mm_sub.loc[edh_mm_sub['netflow']==0]
In [1540]:
#merge edh_mm with iMoneynet
full_table_mm=pd.merge(edh_mm_sub,full_table, left_on =['as_of_dt','cusip'], right_on = ['DATE','CUSIP'])

full_table_mm=full_table_mm.replace('-',np.nan)
full_table_mm=full_table_mm.fillna(method='ffill')
full_table_mm=full_table_mm.fillna(0)
In [1541]:
full_table_mm['target']=np.where(full_table_mm['netflow']>0,'yes','no')
In [328]:
#full_table_mm

3 Rating Agencies

In [321]:
from collections import Counter
In [346]:
sp_rating = Counter(fund_imoneynet_test['S&P Rating'])
labels = sp_rating.keys()
sizes = sp_rating.values()

f, axes = plt.subplots(1,2,figsize=(14,4))

sns.countplot(fund_imoneynet_test['S&P Rating'],ax= axes[0], palette = "Set1")
axes[0].set_xlabel('S&P Rating', fontsize = 14)
axes[0].set_ylabel('Count', fontsize = 14)
axes[0].yaxis.tick_left()

sns.violinplot(x = 'S&P Rating', y= '1-Yr (NCAGR)', data = fund_imoneynet_test, ax = axes[1])
axes[1].set_xlabel('S&P Rating', fontsize = 14)
axes[1].set_ylabel('1-yr NCAGR', fontsize = 14)
axes[1].yaxis.tick_right()

plt.show
Out[346]:
<function matplotlib.pyplot.show(*args, **kw)>
In [333]:
fitch_rating = Counter(fund_imoneynet_test['Fitch Rating'])
labels = fitch_rating.keys()
sizes = fitch_rating.values()

f, axes = plt.subplots(1,2,figsize=(14,4))

sns.countplot(fund_imoneynet_test['Fitch Rating'],ax= axes[0], palette = "Set1")
axes[0].set_xlabel('Fitch Rating', fontsize = 14)
axes[0].set_ylabel('Count', fontsize = 14)
axes[0].yaxis.tick_left()

sns.violinplot(x = 'Fitch Rating', y= '1-Yr (NCAGR)', data = fund_imoneynet_test, ax = axes[1])
axes[1].set_xlabel('Fitch Rating', fontsize = 14)
axes[1].set_ylabel('1-yr NCAGR', fontsize = 14)
axes[1].yaxis.tick_right()

plt.show
Out[333]:
<function matplotlib.pyplot.show(*args, **kw)>
In [340]:
moody_rating = Counter(fund_imoneynet_test['Moodys Rating'])
labels = fitch_rating.keys()
sizes = fitch_rating.values()

f, axes = plt.subplots(1,2,figsize=(14,4))

sns.countplot(fund_imoneynet_test['Moodys Rating'],ax= axes[0], palette = "Set1")
axes[0].set_xlabel('Moodys Rating', fontsize = 14)
axes[0].set_ylabel('Count', fontsize = 14)
axes[0].yaxis.tick_left()

sns.violinplot(x = 'Moodys Rating', y= '1-Yr (NCAGR)', data = fund_imoneynet_test, ax = axes[1])
axes[1].set_xlabel('Moodys Rating', fontsize = 14)
axes[1].set_ylabel('1-yr NCAGR', fontsize = 14)
axes[1].yaxis.tick_right()

plt.show
Out[340]:
<function matplotlib.pyplot.show(*args, **kw)>

3. The Models

EFFR vs. Fund NCAGR over 15 years

In [240]:
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.dates as mdates
import gc
import pylab
In [1605]:
full_table['ncagr']= full_table['1-Yr (NCAGR)']

first_tier_inst=full_table.loc[full_table['SubCategory Current']=='First Tier Instit']
first_tier_re=full_table.loc[full_table['SubCategory Current']=='First Tier Retail']

rolling_mean = first_tier_inst.ncagr.rolling(window=30).mean()
rolling_mean_re = first_tier_re.ncagr.rolling(window=30).mean()

f, ax = plt.subplots(figsize=(20,10))
ax2 = plt.twinx()

full_table.plot(
        kind='line',
        x='DATE',
        y='EFFR',
        ax=ax,
        label='ERRF',
        color='maroon'
    )

ax.format_xdata =  mdates.DateFormatter('%Y-%m-%d')
plt.xlabel("Time")
plt.ylabel("1-year NCAGR and Fed Rate")
f.autofmt_xdate()



plt.plot(first_tier_inst.DATE,
         rolling_mean,
         label ='first_tier_inst_cagr',
         color='green'

)

plt.plot(first_tier_re.DATE,
         rolling_mean_re,
         label = 'first_tier_retail_cagr',
         color='blue'

)

plt.legend(loc='upper left')
plt.show()

plt.close()
In [1599]:
govt_agen_inst=full_table.loc[full_table['SubCategory Current']=='Govt & Agencies Instit']
govt_agen_re=full_table.loc[full_table['SubCategory Current']=='Govt & Agency Retail']

rolling_mean = govt_agen_inst.ncagr.rolling(window=30).mean()
rolling_mean_re = govt_agen_re.ncagr.rolling(window=30).mean()

f, ax = plt.subplots(figsize=(20,10))
ax2 = plt.twinx()

full_table.plot(
        kind='line',
        x='DATE',
        y='EFFR',
        ax=ax,
        label='ERRF',
        color='maroon'
    )

ax.format_xdata =  mdates.DateFormatter('%Y-%m-%d')
plt.xlabel("Time")
plt.ylabel("1-year NCAGR and Fed Rate")
f.autofmt_xdate()


plt.plot(govt_agen_inst.DATE,
         rolling_mean,
         label ='govt_agen_inst_cagr',
         color='green'

)

plt.plot(govt_agen_re.DATE,
         rolling_mean_re,
         label = 'govt_agen_retail_cagr',
         color='blue'

)

plt.legend(loc='upper left')
plt.show()
In [1600]:
tf_national_inst=full_table.loc[full_table['SubCategory Current']=='T-F National Inst']
tf_national_re=full_table.loc[full_table['SubCategory Current']=='T-F National Retail']

rolling_mean = tf_national_inst.ncagr.rolling(window=30).mean()
rolling_mean_re = tf_national_re.ncagr.rolling(window=30).mean()

f, ax = plt.subplots(figsize=(20,10))
ax2 = plt.twinx()

full_table.plot(
        kind='line',
        x='DATE',
        y='EFFR',
        ax=ax,
        label='ERRF',
        color='maroon'
    )

ax.format_xdata =  mdates.DateFormatter('%Y-%m-%d')
plt.xlabel("Time")
plt.ylabel("1-year NCAGR and Fed Rate")
f.autofmt_xdate()


plt.plot(tf_national_inst.DATE,
         rolling_mean,
         label ='tf_national_inst_cagr',
         color='green'

)

plt.plot(tf_national_re.DATE,
         rolling_mean_re,
         label = 'tf_national_retail_cagr',
         color='blue'

)

plt.legend(loc='upper left')
plt.show()
In [1608]:
treas_inst=full_table.loc[full_table['SubCategory Current']=='Treasury Instit']
treas_re=full_table.loc[full_table['SubCategory Current']=='Treasury Retail']

rolling_mean = treas_inst.ncagr.rolling(window=30).mean()
rolling_mean_re =treas_re.ncagr.rolling(window=30).mean()

f, ax = plt.subplots(figsize=(20,10))
ax2 = plt.twinx()

full_table.plot(
        kind='line',
        x='DATE',
        y='EFFR',
        ax=ax,
        label='ERRF',
        color='maroon'
    )

ax.format_xdata =  mdates.DateFormatter('%Y-%m-%d')
plt.xlabel("Time")
plt.ylabel("1-year NCAGR and Fed Rate")
f.autofmt_xdate()


plt.plot(treas_inst.DATE,
         rolling_mean,
         label ='treas_inst_cagr',
         color='green'

)

plt.plot(treas_re.DATE,
         rolling_mean_re,
         label = 'treas_retail_cagr',
         color='blue'

)

plt.legend(loc='upper left')
plt.show()
In [1607]:
full_table['ncagr']= full_table['1-Yr (NCAGR)']
tf_national_inst=full_table.loc[full_table['SubCategory Current']=='T-F National Inst']
tf_national_re=full_table.loc[full_table['SubCategory Current']=='T-F National Retail']

rolling_mean = tf_national_inst.ncagr.rolling(window=15).mean()
rolling_mean_re = tf_national_re.ncagr.rolling(window=15).mean()

f, ax = plt.subplots(figsize=(20,10))
ax2 = plt.twinx()

tf_national_inst.plot(
        kind='line',
        x='DATE',
        y='EFFR',
        ax=ax,
        label='ERRF',
        color='maroon'
    )

ax.format_xdata =  mdates.DateFormatter('%Y-%m-%d')
plt.xlabel("Time")
plt.ylabel("1-year NCAGR and Fed Rate")
f.autofmt_xdate()



plt.plot(tf_national_inst.DATE,
         rolling_mean,
         label ='tf_national_inst_cagr',
         color='green'

)

plt.plot(tf_national_re.DATE,
         rolling_mean_re,
         label = 'tf_national_retail_cagr',
         color='blue'

)

plt.legend(loc='upper left')
plt.show()
In [301]:
low_effr = full_table.loc[full_table['flag'] == 'Low EFFR',:]
high_effr = full_table.loc[full_table['flag'] == 'High EFFR',:]
In [182]:
from scipy.stats import ttest_ind
from scipy import stats
from subprocess import check_output
from sklearn.preprocessing import StandardScaler
import math

Random Forest Regressions by iMoneynet Category

step 1: create subgroups by category

step 2: perform Predictive Power Score and visualize in matrix

step 3: perform random forest regression model using high correlated attributes

step 4: parameter tunning

step 5: evaluate the model

step 6: fund size t-test

First Tier

In [1549]:
ft_mm_table = full_table_mm
#ft_mm_table = ft_mm_table.set_index('SubCategory Current',inplace = True)

#
ft_mm_table=full_table_mm.loc[['First Tier Retail','First Tier Instit']]

ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating', 
                         'flag','target']]

mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('First Tier-PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1549]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65ca073828>
In [1477]:
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

from sklearn.metrics import r2_score

import statsmodels.api as sm

from sklearn.model_selection import cross_val_score, RepeatedKFold, train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

from sklearn import preprocessing
from sklearn.preprocessing import Normalizer

pd.options.display.max_columns = None

from rfpimp import permutation_importances
In [1479]:
ft_mm_table_sub = ft_mm_table.drop(['target'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['flag','Fitch Rating', "Moody's Rating", 'S&P Rating', 'fund_size_large'
])

x = encoder.fit_transform(x)
In [1480]:
#split dataset
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =101)
In [1481]:
rf = RandomForestRegressor(n_estimators = 1500,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 80,
                          max_features = 'auto',
                          max_leaf_nodes = 50)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.85 
00B Score: 0.03 
R^2 Validation Score: 0.21

The training score seems acceptable but the validation score indicates overfitting (low bias but high variance). Try scale/normalize the features and prevent from capturing the noise of the data.

In [1482]:
#normalize features
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 100),scaled_x,y,cv = RepeatedKFold(n_repeats = 5))
In [1483]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: 0.6648883144487518

Mean validation score boost to 0.66 with scaled features

In [1484]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1485]:
perm_importance_rfpimp
Out[1485]:
Feature Importance
0 Portfolio Assets (mo/$mils) 0.492724
1 Incd Other Fees (mo) 0.232326
2 1-Yr (NCAGR) 0.160703
3 random 0.142432
4 Incd Shr. Svc. Fee (mo) 0.127023
5 Incd Expense Ratio (mo) 0.072468
6 month 0.069040
7 1-DSY (dly) 0.059356
8 30-DSY (wk/mo) 0.045334
9 Incd Mgmt Fee (mo) 0.028703
10 Moody's Rating_1 0.016044
11 Moody's Rating_2 0.012191
12 Fitch Rating_1 0.006949
13 Fitch Rating_2 0.005922
14 S&P Rating_1 0.004932
15 S&P Rating_2 0.003668
16 fund_size_large_2 0.003536
17 fund_size_large_1 0.002484
18 year 0.002414
19 flag_1 0.000000

Fund Size

First Tier Instit

In [1323]:
q_hi = ft_mm_table["netflow"].quantile(0.99)
q_low = ft_mm_table["netflow"].quantile(0.01)
ft_mm_table=ft_mm_table[(ft_mm_table["netflow"]<q_hi) & (ft_mm_table["netflow"]>q_low) ]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [1326]:
#t-test on Large fund vs. Samll fund
large_fund = ft_mm_table.loc[ft_mm_table['fund_size_large'] == 'yes',:] #define large fund df
small_fund = ft_mm_table.loc[ft_mm_table['fund_size_large'] == 'no',:] #define small fund df
In [1327]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[1327]:
Ttest_indResult(statistic=-1.994442553568534, pvalue=0.04723964045361748)

there is a relationship (difference is significant) between large_fund Net Flow and small_fund Net Flow since the p-value is <0.05

In [1328]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
Out[1328]:
<matplotlib.legend.Legend at 0x7f65a41061d0>
In [1321]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=ft_mm_table,orient="h")

First Tier Retail

In [520]:
first_tier_re=full_table_mm.loc[full_table_mm['SubCategory Current']=='First Tier Retail']
q_hi = first_tier_re["netflow"].quantile(0.99)
q_low = first_tier_re["netflow"].quantile(0.01)
first_tier_re=first_tier_re[(first_tier_re["netflow"]<q_hi) & (first_tier_re["netflow"]>q_low)]
In [521]:
#t-test on Large fund vs. Samll fund
large_fund = first_tier_re.loc[first_tier_re['fund_size_large'] == 'yes',:] #define large fund df
small_fund = first_tier_re.loc[first_tier_re['fund_size_large'] == 'no',:] #define small fund df
In [522]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[522]:
Ttest_indResult(statistic=4.697077775025281, pvalue=4.376776666998996e-06)
In [607]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[607]:
<matplotlib.legend.Legend at 0x7f65c416a6d8>
In [608]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

Government Agency

In [1486]:
ft_mm_table=full_table_mm.loc[['Govt & Agencies Instit','Govt & Agency Retail']]
ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating', 'flag','target']]
mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('Govt & Agency-PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1486]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65ca6800f0>
In [1487]:
ft_mm_table_sub = ft_mm_table.drop(['target'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['Fitch Rating', "Moody's Rating", 'S&P Rating', 'flag','fund_size_large'
])

x = encoder.fit_transform(x)
In [1488]:
#split dataset
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =42)
In [1490]:
rf = RandomForestRegressor(n_estimators = 1000,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 42)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.95 
00B Score: 0.64 
R^2 Validation Score: 0.54
In [1491]:
#normalize target
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 100),scaled_x,y,cv = RepeatedKFold(n_repeats = 5))
In [1492]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: 0.9084405026452643
In [1493]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1494]:
perm_importance_rfpimp
Out[1494]:
Feature Importance
0 Portfolio Assets (mo/$mils) 1.563674e+00
1 Incd Mgmt Fee (mo) 7.414704e-02
2 Incd Expense Ratio (mo) 6.855982e-02
3 month 5.698207e-02
4 random 2.120287e-02
5 1-Yr (NCAGR) 1.850291e-02
6 Incd Other Fees (mo) 1.555571e-02
7 1-DSY (dly) 9.961407e-03
8 Incd Shr. Svc. Fee (mo) 8.982094e-03
9 30-DSY (wk/mo) 8.502365e-03
10 year 4.621045e-03
11 Fitch Rating_1 8.200094e-04
12 Fitch Rating_2 5.790057e-04
13 fund_size_large_2 5.126520e-04
14 Moody's Rating_2 3.903418e-04
15 fund_size_large_1 3.703690e-04
16 S&P Rating_1 3.414579e-04
17 S&P Rating_2 2.381206e-04
18 Moody's Rating_1 2.214061e-04
19 flag_1 1.110223e-16

Govt Agencies Instit

In [609]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Govt & Agencies Instit']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [610]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [611]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[611]:
Ttest_indResult(statistic=0.4171587146559813, pvalue=0.6769150309671422)
In [614]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[614]:
<matplotlib.legend.Legend at 0x7f65c3e8c470>
In [613]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

Govt & Agency Retail

In [615]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Govt & Agency Retail']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [616]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [617]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[617]:
Ttest_indResult(statistic=0.9531331144175218, pvalue=0.34139063450109075)
In [618]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[618]:
<matplotlib.legend.Legend at 0x7f65c3dbe5f8>
In [619]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

Treasury & Repo

In [1425]:
ft_mm_table=full_table_mm.loc[['Treas & Repo Retail','Treasury & Repo Instit']]
ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','target']]
mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('Treasury & Repo -PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1425]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65cdae1470>
In [1426]:
ft_mm_table_sub = ft_mm_table.drop(['target'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','fund_size_large'
])

x = encoder.fit_transform(x)
In [1427]:
#split dataset
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =42)
In [1428]:
rf = RandomForestRegressor(n_estimators = 1000,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 42)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.87 
00B Score: 0.02 
R^2 Validation Score: 0.27
In [1429]:
#normalize target
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 100),scaled_x,y,cv = RepeatedKFold(n_repeats = 5))
In [1430]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: 0.8152203784575159
In [1431]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1432]:
perm_importance_rfpimp
Out[1432]:
Feature Importance
0 Portfolio Assets (mo/$mils) 0.705041
1 random 0.187624
2 Incd Mgmt Fee (mo) 0.172219
3 month 0.079719
4 1-Yr (NCAGR) 0.076512
5 30-DSY (wk/mo) 0.068603
6 1-DSY (dly) 0.046875
7 Incd Expense Ratio (mo) 0.039440
8 Incd Other Fees (mo) 0.034383
9 Incd Shr. Svc. Fee (mo) 0.025373
10 year 0.019378
11 Moody's Rating_2 0.003789
12 Moody's Rating_1 0.003529
13 fund_size_large_2 0.001630
14 Fitch Rating_1 0.001237
15 Fitch Rating_2 0.001038
16 12b_1_1 0.000896
17 12b_1_2 0.000720
18 fund_size_large_1 0.000483
19 S&P Rating_1 0.000122
20 S&P Rating_2 0.000080

Treasury & Repo Instit

In [620]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Treasury & Repo Instit']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [621]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [622]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[622]:
Ttest_indResult(statistic=-1.8381052757972385, pvalue=0.06722784519275317)
In [625]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[625]:
<matplotlib.legend.Legend at 0x7f65c3b9ef60>
In [624]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

Treasury & Repo Retail

In [630]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Treas & Repo Retail']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [631]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [632]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[632]:
Ttest_indResult(statistic=0.8425084576454307, pvalue=0.403732466249048)
In [633]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[633]:
<matplotlib.legend.Legend at 0x7f65c3f67198>
In [634]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

T-F National

In [1434]:
ft_mm_table=full_table_mm.loc[['T-F National Inst','T-F National Retail']]
ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','target']]
mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('T-F National-PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1434]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65c654add8>
In [1435]:
ft_mm_table_sub = ft_mm_table.drop(['target'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','fund_size_large'
])

x = encoder.fit_transform(x)
In [1436]:
#split dataset
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =42)
In [1437]:
rf = RandomForestRegressor(n_estimators = 1000,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 42)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.83 
00B Score: -0.26 
R^2 Validation Score: -0.29
In [1438]:
#normalize target
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 100),scaled_x,y,cv = RepeatedKFold(n_repeats = 5))
In [1439]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: 0.38064490424102987

Even after normalizing the features, the validation score is still not high. The randomness within this category is strong.

In [1440]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1441]:
perm_importance_rfpimp
Out[1441]:
Feature Importance
0 Incd Shr. Svc. Fee (mo) 0.857468
1 random 0.337162
2 Portfolio Assets (mo/$mils) 0.146587
3 30-DSY (wk/mo) 0.088156
4 Incd Expense Ratio (mo) 0.060897
5 1-Yr (NCAGR) 0.050103
6 1-DSY (dly) 0.044193
7 S&P Rating_1 0.009803
8 S&P Rating_2 0.004408
9 Incd Mgmt Fee (mo) 0.002549
10 Incd Other Fees (mo) 0.002501
11 month 0.001555
12 year 0.000939
13 12b_1_2 0.000227
14 fund_size_large_1 0.000224
15 fund_size_large_2 0.000184
16 Fitch Rating_2 0.000178
17 Fitch Rating_1 0.000103
18 Moody's Rating_2 0.000099
19 Moody's Rating_1 0.000067
20 12b_1_1 -0.000646

Based on the above result, we can also observe that the random factor is high in predicting the netflow.

T-F National Instit

In [639]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='T-F National Inst']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [640]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [641]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[641]:
Ttest_indResult(statistic=1.2103742756603435, pvalue=0.2277073698919335)
In [642]:
ax = plt.subplots(figsize = (18,8))
#plot the small fund ncagr distribution
sns.distplot(small_fund['netflow'],bins=20, hist = True, label = 'small fund')

#plot the large fund ncagr distribution
sns.distplot(large_fund['netflow'],bins=20, hist = True, label = 'large fund')
plt.legend() #show legend
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")
Out[642]:
<matplotlib.legend.Legend at 0x7f65c3b7de48>
In [643]:
#boxplot - test assumption
ax = sns.boxplot(y="fund_size_large",x="netflow",data=first_tier_re,orient="h")

T-F National Retail

In [644]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='T-F National Retail']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [645]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [646]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[646]:
Ttest_indResult(statistic=-1.936657504830735, pvalue=0.05380662089941752)

Treasury

In [1442]:
ft_mm_table=full_table_mm.loc[['Treasury Instit','Treasury Retail']]
ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','target']]
mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('Treasury-PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1442]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65d101dfd0>
In [1443]:
ft_mm_table_sub = ft_mm_table.drop(['target'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['Fitch Rating', "Moody's Rating", 'S&P Rating', '12b_1','fund_size_large'
])

x = encoder.fit_transform(x)
In [1444]:
#split dataset
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =42)
In [1445]:
rf = RandomForestRegressor(n_estimators = 1000,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 42)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.95 
00B Score: 0.60 
R^2 Validation Score: 0.19
In [1446]:
#normalize target
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 100),scaled_x,y,cv = RepeatedKFold(n_repeats = 5))
In [1447]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: 0.8192308296914281
In [1448]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1449]:
perm_importance_rfpimp
Out[1449]:
Feature Importance
0 Portfolio Assets (mo/$mils) 1.510008
1 month 0.301523
2 Incd Other Fees (mo) 0.071388
3 1-Yr (NCAGR) 0.051029
4 random 0.022878
5 30-DSY (wk/mo) 0.022672
6 Incd Mgmt Fee (mo) 0.020898
7 year 0.014798
8 1-DSY (dly) 0.013867
9 Incd Shr. Svc. Fee (mo) 0.012213
10 Incd Expense Ratio (mo) 0.007146
11 fund_size_large_1 0.001779
12 fund_size_large_2 0.001489
13 12b_1_2 0.000370
14 12b_1_1 0.000315
15 Moody's Rating_2 0.000218
16 Moody's Rating_1 0.000183
17 S&P Rating_2 0.000177
18 Fitch Rating_1 0.000097
19 S&P Rating_1 0.000049
20 Fitch Rating_2 0.000035

Treasury Instit

In [656]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Treasury Instit']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [657]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [658]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[658]:
Ttest_indResult(statistic=0.3661838302964125, pvalue=0.7144582860465446)

Treasury Retail

In [659]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='Treasury Retail']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [660]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [661]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[661]:
Ttest_indResult(statistic=0.19955180276941764, pvalue=0.8433564419415551)

T-F State

In [1548]:
full_table_mm['year'] = pd.DatetimeIndex(full_table_mm['DATE']).year 
full_table_mm['month'] = pd.DatetimeIndex(full_table_mm['DATE']).month
ft_mm_table = full_table_mm

#ft_mm_table = ft_mm_table.set_index('SubCategory Current',inplace = True)
ft_mm_table=full_table_mm.loc[['T-F State Inst','T-F State Retail']]
ft_mm_table=ft_mm_table[['1-Yr (NCAGR)','30-DSY (wk/mo)', 'Portfolio Assets (mo/$mils)','netflow', '1-DSY (dly)', 'Incd Expense Ratio (mo)', 'Incd Mgmt Fee (mo)', 'Incd Other Fees (mo)', 'Incd Shr. Svc. Fee (mo)','fund_size_large','year','month','Fitch Rating', "Moody's Rating", 'S&P Rating','target']]
mm_matrix=pps.matrix(ft_mm_table)

def heatmap(df):
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('T-F State matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

sns.set(font_scale=1.4)
plt.subplots(figsize=(25,20))
heatmap(mm_matrix)
Out[1548]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f65ca0785c0>

Note: All T-F State category funds have no rating attributes

In [1452]:
#Drop rating attributes for T-F State category
ft_mm_table_sub = ft_mm_table.drop(['target','Fitch Rating', "Moody's Rating", 'S&P Rating'],axis =1)
#creat LinearRegression dataset
x = ft_mm_table_sub.drop(["netflow"],axis =1)
y = ft_mm_table_sub['netflow']

categorical = [col for col in x.columns if x[col].dtypes == 'O']
numerical = [col for col in x.columns if x[col].dtypes != 'O']


# encode categorical variables with one-hot encoding

encoder = ce.OneHotEncoder(cols=['12b_1','fund_size_large'
])

x = encoder.fit_transform(x)
In [1454]:
#split dataset and add a random attribute
np.random.seed(seed=42)
x['random']=np.random.random(size=len(x))
x_train,x_valid,y_train,y_valid = train_test_split(x,y,test_size = 0.8, random_state =42)
In [1455]:
rf = RandomForestRegressor(n_estimators = 1000,
                          n_jobs = -1,
                          oob_score = True,
                          bootstrap = True,
                          random_state = 42)
rf.fit(x_train,y_train)

print('R^2 training score: {:.2f} \n00B Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(rf.score(x_train,y_train),
                                                                                            rf.oob_score_,
                                                                                            rf.score(x_valid,y_valid)))
R^2 training score: 0.87 
00B Score: -0.03 
R^2 Validation Score: -0.00
In [1464]:
#normalize target
from sklearn.preprocessing import RobustScaler,MinMaxScaler

scaler = StandardScaler()

scaled_x = scaler.fit_transform(x)

cols = x.columns

scaled_x = pd.DataFrame(scaled_x, columns=[cols])

#scaled_x

score_lr = cross_val_score(LinearRegression(),scaled_x,y,cv=RepeatedKFold(n_repeats = 10))
score_rf = cross_val_score(RandomForestRegressor(n_estimators = 1000),scaled_x,y,cv = RepeatedKFold(n_repeats = 4))
In [1465]:
print("\nRandom Forest Mean Validation Score:",np.mean(score_rf))
Random Forest Mean Validation Score: -0.5819992940664627
In [1458]:
def r2(rf,x_train,y_train):
    return r2_score(y_train,rf.predict(x_train))

perm_importance_rfpimp = permutation_importances(rf,x_train,y_train,r2)
perm_importance_rfpimp.reset_index(drop = False, inplace = True)
In [1459]:
perm_importance_rfpimp
Out[1459]:
Feature Importance
0 Incd Other Fees (mo) 0.620501
1 1-Yr (NCAGR) 0.204220
2 Incd Shr. Svc. Fee (mo) 0.146277
3 random 0.115533
4 30-DSY (wk/mo) 0.069443
5 month 0.051165
6 Portfolio Assets (mo/$mils) 0.050699
7 1-DSY (dly) 0.040929
8 Incd Expense Ratio (mo) 0.026954
9 12b_1_1 0.004244
10 12b_1_2 0.004241
11 year 0.002686
12 Incd Mgmt Fee (mo) 0.001858
13 fund_size_large_1 0.001544
14 fund_size_large_2 0.000693

T-F State Inst

In [673]:
subset=full_table_mm.loc[full_table_mm['SubCategory Current']=='T-F State Retail']
q_hi = subset["netflow"].quantile(0.99)
q_low = subset["netflow"].quantile(0.01)
subset=subset[(subset["netflow"]<q_hi) & (subset["netflow"]>q_low)]
Two-sample T-Test :

H0:there is no difference between large funds and small funds in terms of netflow

H1: netflow is different with large and small funds

we want to know whether average of each group is significantly with the other group.

The main idea of t-test is to check whether the observed value is strong than the variation on the data.

In [674]:
#t-test on Large fund vs. Samll fund
large_fund = subset.loc[subset['fund_size_large'] == 'yes',:] #define large fund df
small_fund = subset.loc[subset['fund_size_large'] == 'no',:] #define small fund df
In [675]:
ttest_ind(large_fund['netflow'],small_fund['netflow'],equal_var = False)
Out[675]:
Ttest_indResult(statistic=0.5213768991865387, pvalue=0.6023295984408618)

LM Clients

In [1259]:
import seaborn as sns
sns.set(style='white')
In [842]:
file_name_4 ="MM_citi.csv"
obj_1 = s3.get_object(Bucket=bucket, Key = file_name_4)
lm_ts = pd.read_csv(obj_1['Body'])

file_name_5 ="HistoricalData.csv"
obj_1 = s3.get_object(Bucket=bucket, Key = file_name_5)
oil_p = pd.read_csv(obj_1['Body'])
In [843]:
lm_ts = lm_ts.drop(['File Paths','Table Names','Social Code','Cusip Number', 'Dealer','Company','Money Market Indicator'],axis=1)
In [844]:
oil_p.dtypes
Out[844]:
Date            object
 Close/Last    float64
 Volume          int64
 Open          float64
 High          float64
 Low           float64
dtype: object
In [845]:
#change data type
oil_p['Date'] = pd.to_datetime(oil_p['Date'])
lm_ts['Balance Date'] = pd.to_datetime(lm_ts['Balance Date'])
In [846]:
#merge oil price with lm client
merged_lm = pd.merge(oil_p, lm_ts, left_on='Date', right_on = 'Balance Date', how = 'right')
In [852]:
#Total Dollar Balance is skwed to left, perform log transformation
merged_lm['log_tdb'] = np.log(merged_lm['Total Dollar Balance'])
In [1611]:
merged_lm.head()
Out[1611]:
Date Close/Last Volume Open High Low Name Only Block Port Desc Fund Name Soc Desc Category CountOfAccount Total Dollar Balance Balance Date log_tdb
0 2020-03-31 20.48 604605.0 20.23 21.89 20.01 MORGAN STANLEY SMITH BARNEY LLC WESTERN ASSET GOVERNMENT MONEY MKT FD, LTD. WESTERN ASSET GOVERNMENT MONEY MKT FD, LTD. CL 1 BROKERS AND DEALERS OMNI / UNDISCLOSED 1.0 4.992170e+08 2020-03-31 20.028551
1 2020-03-31 20.48 604605.0 20.23 21.89 20.01 MORGAN STANLEY SMITH BARNEY LLC WESTERN ASSET GOVERNMENT RESERVES WESTERN ASSET GOVERNMENT RESERVES CL A BUSINESS CORPS INST. 2.0 4.086431e+04 2020-03-31 10.618012
2 2020-03-31 20.48 604605.0 20.23 21.89 20.01 MORGAN STANLEY SMITH BARNEY LLC WESTERN ASSET GOVERNMENT RESERVES WESTERN ASSET GOVERNMENT RESERVES CL A BROKERS AND DEALERS OMNI / UNDISCLOSED 1.0 1.309847e+08 2020-03-31 18.690591
3 2020-03-31 20.48 604605.0 20.23 21.89 20.01 MORGAN STANLEY SMITH BARNEY LLC WESTERN ASSET GOVERNMENT RESERVES WESTERN ASSET GOVERNMENT RESERVES CL A IRA RETAIL 1.0 6.031880e+03 2020-03-31 8.704814
4 2020-03-31 20.48 604605.0 20.23 21.89 20.01 MORGAN STANLEY SMITH BARNEY LLC WESTERN ASSET GOVERNMENT RESERVES WESTERN ASSET GOVERNMENT RESERVES CL A 401K PLAN RETAIL 1.0 6.730919e+04 2020-03-31 11.117052
In [854]:
merged_lm = merged_lm.dropna(subset=['Date'])
In [1552]:
sns.pairplot(merged_lm,size=2.5)
plt.show
Out[1552]:
<function matplotlib.pyplot.show(*args, **kw)>

after log transformation, the correlation between oil price and total asset balance is still insignificant

In [1620]:
# Correlation btw oil price and total asset balance group by clients
merged_lm.groupby('Name Only Block')[['Low','log_tdb']].corr()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1620-9242f4c80445> in <module>()
      1 # Correlation btw oil price and total asset balance group by clients
----> 2 merged_lm.groupby('Name Only Block')[['Low','log_tdb']].corr()

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
    263 
    264         f = getattr(self, arg, None)
--> 265         if f is not None:
    266             if callable(f):
    267                 return f(*args, **kwargs)

KeyError: "Columns not found: 'Low'"

LM Funds Single Time Series

In [1251]:
from pyramid.arima import auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_pacf,plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose
In [1578]:
WA_fund1=edh_mm.loc[edh_mm['cusip']=='52470G791']
In [1295]:
WA_fund2=df_data_1.loc[df_data_1['CUSIP']=='52470G791']
In [1297]:
WA_fund2=WA_fund2[['Date','Portfolio Assets (mo/$mils)']]
In [1579]:
WA_fund1=WA_fund1[['as_of_dt','netflow']]
In [1299]:
#WA_fund2
#WA_fund2.set_index('Date', inplace=True)
WA_fund2.Date = pd.to_datetime(WA_fund2.Date)
WA_fund2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 183 entries, 110354 to 110536
Data columns (total 2 columns):
Date                           183 non-null datetime64[ns]
Portfolio Assets (mo/$mils)    183 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.3 KB
In [947]:
result = seasonal_decompose(WA_fund2,freq=12)
sns.set(rc={'figure.figsize':(20,5)})
result.plot()
Out[947]:

By observing the seasonal trend, we can see that there is surely a seasonal cmponent in the data - multiplicative

In [937]:
plot_acf(WA_fund2)
Out[937]:
In [1271]:
auto_arima(WA_fund2['Portfolio Assets (mo/$mils)'],seasonal=True,m=12,suppress_warnings=True,information_criterion='aic',max_P=5,max_D=5,max_Q=5,max_p=5,max_d=5,max_q=5).summary()
Out[1271]:
Statespace Model Results
Dep. Variable: y No. Observations: 183
Model: SARIMAX(2, 1, 2)x(0, 0, 1, 12) Log Likelihood -1520.577
Date: Tue, 19 May 2020 AIC 3055.154
Time: 21:39:06 BIC 3077.582
Sample: 0 HQIC 3064.246
- 183
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 41.4059 60.986 0.679 0.497 -78.125 160.937
ar.L1 1.0015 0.208 4.818 0.000 0.594 1.409
ar.L2 -0.4076 0.198 -2.056 0.040 -0.796 -0.019
ma.L1 -1.0697 0.177 -6.032 0.000 -1.417 -0.722
ma.L2 0.6527 0.163 4.006 0.000 0.333 0.972
ma.S.L12 0.2552 0.081 3.152 0.002 0.097 0.414
sigma2 1.048e+06 7.88e+04 13.304 0.000 8.94e+05 1.2e+06
Ljung-Box (Q): 28.89 Jarque-Bera (JB): 66.01
Prob(Q): 0.90 Prob(JB): 0.00
Heteroskedasticity (H): 4.35 Skew: 0.71
Prob(H) (two-sided): 0.00 Kurtosis: 5.58


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
In [1557]:
train = WA_fund2[:100]
test = WA_fund2[100:]
In [1587]:
WA_fund2['y']=WA_fund2['Portfolio Assets (mo/$mils)']
WA_fund2['ds']=WA_fund2['Date']
In [1591]:
from fbprophet import Prophet
m=Prophet(seasonality_mode='multiplicative')
m.fit(WA_fund2)
future=m.make_future_dataframe(periods=24,freq='M')
forecast=m.predict(future)
forecast
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[1591]:
ds trend yhat_lower yhat_upper trend_lower trend_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yearly yearly_lower yearly_upper additive_terms additive_terms_lower additive_terms_upper yhat
0 2005-01-31 -252.330245 -3661.413947 3000.054728 -252.330245 -252.330245 0.067373 0.067373 0.067373 0.067373 0.067373 0.067373 0.0 0.0 0.0 -269.330523
1 2005-02-28 -97.889915 -3526.701933 3224.447184 -97.889915 -97.889915 0.027171 0.027171 0.027171 0.027171 0.027171 0.027171 0.0 0.0 0.0 -100.549650
2 2005-03-31 73.097595 -3128.695048 3517.687266 73.097595 73.097595 0.025013 0.025013 0.025013 0.025013 0.025013 0.025013 0.0 0.0 0.0 74.926015
3 2005-04-30 238.569378 -3228.405416 3537.860405 238.569378 238.569378 0.023120 0.023120 0.023120 0.023120 0.023120 0.023120 0.0 0.0 0.0 244.085058
4 2005-05-31 409.556887 -3106.378701 3683.524530 409.556887 409.556887 -0.001677 -0.001677 -0.001677 -0.001677 -0.001677 -0.001677 0.0 0.0 0.0 408.870245
5 2005-06-30 575.028670 -2752.276585 3831.407266 575.028670 575.028670 0.002357 0.002357 0.002357 0.002357 0.002357 0.002357 0.0 0.0 0.0 576.384258
6 2005-07-31 746.016179 -2627.409846 4039.893531 746.016179 746.016179 -0.060197 -0.060197 -0.060197 -0.060197 -0.060197 -0.060197 0.0 0.0 0.0 701.107993
7 2005-08-31 917.003688 -2420.449596 4092.660375 917.003688 917.003688 -0.065345 -0.065345 -0.065345 -0.065345 -0.065345 -0.065345 0.0 0.0 0.0 857.082249
8 2005-09-30 1082.475470 -2447.285406 4310.827279 1082.475470 1082.475470 0.028336 0.028336 0.028336 0.028336 0.028336 0.028336 0.0 0.0 0.0 1113.148977
9 2005-10-31 1253.462979 -2036.644445 4643.768283 1253.462979 1253.462979 0.002634 0.002634 0.002634 0.002634 0.002634 0.002634 0.0 0.0 0.0 1256.764007
10 2005-11-30 1418.934762 -1956.293977 5022.879784 1418.934762 1418.934762 0.049001 0.049001 0.049001 0.049001 0.049001 0.049001 0.0 0.0 0.0 1488.463483
11 2005-12-31 1589.922270 -1501.483984 5153.023980 1589.922270 1589.922270 0.096484 0.096484 0.096484 0.096484 0.096484 0.096484 0.0 0.0 0.0 1743.323839
12 2006-01-31 1760.909779 -1649.244200 5259.682905 1760.909779 1760.909779 0.048190 0.048190 0.048190 0.048190 0.048190 0.048190 0.0 0.0 0.0 1845.767934
13 2006-02-28 1915.350112 -1292.512558 5098.815725 1915.350112 1915.350112 0.051629 0.051629 0.051629 0.051629 0.051629 0.051629 0.0 0.0 0.0 2014.237171
14 2006-03-31 2086.337624 -1286.251822 5477.628308 2086.337624 2086.337624 0.027505 0.027505 0.027505 0.027505 0.027505 0.027505 0.0 0.0 0.0 2143.722677
15 2006-04-30 2251.809410 -697.229599 5558.317115 2251.809410 2251.809410 0.021946 0.021946 0.021946 0.021946 0.021946 0.021946 0.0 0.0 0.0 2301.227700
16 2006-05-31 2422.796921 -905.802937 5297.988223 2422.796921 2422.796921 -0.011090 -0.011090 -0.011090 -0.011090 -0.011090 -0.011090 0.0 0.0 0.0 2395.928682
17 2006-06-30 2588.268707 -782.899029 6022.167672 2588.268707 2588.268707 0.001272 0.001272 0.001272 0.001272 0.001272 0.001272 0.0 0.0 0.0 2591.562051
18 2006-07-31 2759.256219 -875.535414 5707.730361 2759.256219 2759.256219 -0.063273 -0.063273 -0.063273 -0.063273 -0.063273 -0.063273 0.0 0.0 0.0 2584.669788
19 2006-08-31 2930.243730 -781.403454 5851.517650 2930.243730 2930.243730 -0.060921 -0.060921 -0.060921 -0.060921 -0.060921 -0.060921 0.0 0.0 0.0 2751.731185
20 2006-09-30 3095.715516 -223.407509 6691.421380 3095.715516 3095.715516 0.034578 0.034578 0.034578 0.034578 0.034578 0.034578 0.0 0.0 0.0 3202.758814
21 2006-10-31 3266.703027 -101.091339 6835.769164 3266.703027 3266.703027 0.004381 0.004381 0.004381 0.004381 0.004381 0.004381 0.0 0.0 0.0 3281.014667
22 2006-11-30 3432.174813 107.061982 6812.932225 3432.174813 3432.174813 0.048457 0.048457 0.048457 0.048457 0.048457 0.048457 0.0 0.0 0.0 3598.488922
23 2006-12-31 3603.162324 864.514413 7378.340218 3603.162324 3603.162324 0.089553 0.089553 0.089553 0.089553 0.089553 0.089553 0.0 0.0 0.0 3925.834993
24 2007-01-31 3774.149836 415.372696 7321.474082 3774.149836 3774.149836 0.029402 0.029402 0.029402 0.029402 0.029402 0.029402 0.0 0.0 0.0 3885.116966
25 2007-02-28 3928.590169 884.235192 7319.035346 3928.590169 3928.590169 0.076310 0.076310 0.076310 0.076310 0.076310 0.076310 0.0 0.0 0.0 4228.379198
26 2007-03-31 4099.577680 741.980256 7240.631642 4099.577680 4099.577680 0.029660 0.029660 0.029660 0.029660 0.029660 0.029660 0.0 0.0 0.0 4221.172694
27 2007-04-30 4265.049465 975.680336 7686.269525 4265.049465 4265.049465 0.020485 0.020485 0.020485 0.020485 0.020485 0.020485 0.0 0.0 0.0 4352.421079
28 2007-05-31 4436.036976 982.129884 7749.800960 4436.036976 4436.036976 -0.020562 -0.020562 -0.020562 -0.020562 -0.020562 -0.020562 0.0 0.0 0.0 4344.824488
29 2007-06-30 4601.508762 1346.628826 8178.080091 4601.508762 4601.508762 0.000310 0.000310 0.000310 0.000310 0.000310 0.000310 0.0 0.0 0.0 4602.936532
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
177 2019-10-31 13735.119518 10551.762863 17023.934524 13735.119518 13735.119518 0.005959 0.005959 0.005959 0.005959 0.005959 0.005959 0.0 0.0 0.0 13816.963645
178 2019-11-30 13758.172157 10953.830099 17765.923981 13758.172157 13758.172157 0.047687 0.047687 0.047687 0.047687 0.047687 0.047687 0.0 0.0 0.0 14414.261105
179 2019-12-31 13781.993217 11690.107519 18153.890042 13781.993217 13781.993217 0.082431 0.082431 0.082431 0.082431 0.082431 0.082431 0.0 0.0 0.0 14918.054302
180 2020-01-31 13805.814277 10579.332798 17203.442881 13805.814277 13805.814277 0.011029 0.011029 0.011029 0.011029 0.011029 0.011029 0.0 0.0 0.0 13958.080862
181 2020-02-29 13828.098494 10535.508055 17382.124427 13828.098494 13828.098494 0.002962 0.002962 0.002962 0.002962 0.002962 0.002962 0.0 0.0 0.0 13869.063601
182 2020-03-31 13851.919554 10532.582887 17224.311217 13851.919554 13851.919554 0.022195 0.022195 0.022195 0.022195 0.022195 0.022195 0.0 0.0 0.0 14159.366745
183 2020-04-30 13874.972193 11067.747547 17380.465917 13874.972193 13874.972193 0.024003 0.024003 0.024003 0.024003 0.024003 0.024003 0.0 0.0 0.0 14208.018475
184 2020-05-31 13898.793253 10560.754041 17233.148854 13898.128699 13899.499191 0.007665 0.007665 0.007665 0.007665 0.007665 0.007665 0.0 0.0 0.0 14005.332107
185 2020-06-30 13921.845891 10647.454757 17400.624777 13918.300374 13925.558927 0.003560 0.003560 0.003560 0.003560 0.003560 0.003560 0.0 0.0 0.0 13971.402828
186 2020-07-31 13945.666951 9867.382950 16602.153884 13937.348263 13954.469420 -0.056922 -0.056922 -0.056922 -0.056922 -0.056922 -0.056922 0.0 0.0 0.0 13151.847849
187 2020-08-31 13969.488011 9385.419088 16185.169501 13955.283089 13983.796794 -0.069562 -0.069562 -0.069562 -0.069562 -0.069562 -0.069562 0.0 0.0 0.0 12997.746963
188 2020-09-30 13992.540650 11156.749000 17596.780512 13971.868839 14013.666586 0.022028 0.022028 0.022028 0.022028 0.022028 0.022028 0.0 0.0 0.0 14300.761988
189 2020-10-31 14016.361710 10623.528705 17473.340548 13987.865850 14045.510351 0.000721 0.000721 0.000721 0.000721 0.000721 0.000721 0.0 0.0 0.0 14026.461139
190 2020-11-30 14039.414349 11331.587627 18196.606133 14001.382767 14076.092705 0.049314 0.049314 0.049314 0.049314 0.049314 0.049314 0.0 0.0 0.0 14731.759965
191 2020-12-31 14063.235409 12122.118071 18863.955232 14014.768939 14108.090604 0.103210 0.103210 0.103210 0.103210 0.103210 0.103210 0.0 0.0 0.0 15514.700450
192 2021-01-31 14087.056469 11817.093878 18485.650800 14028.769808 14140.047483 0.067373 0.067373 0.067373 0.067373 0.067373 0.067373 0.0 0.0 0.0 15036.145527
193 2021-02-28 14108.572265 11027.567070 17616.911053 14041.663227 14170.427022 0.027171 0.027171 0.027171 0.027171 0.027171 0.027171 0.0 0.0 0.0 14491.911824
194 2021-03-31 14132.393325 11342.180465 18215.281508 14055.891041 14205.998618 0.025013 0.025013 0.025013 0.025013 0.025013 0.025013 0.0 0.0 0.0 14485.892693
195 2021-04-30 14155.445963 11170.324080 17789.716288 14068.041123 14240.057343 0.023120 0.023120 0.023120 0.023120 0.023120 0.023120 0.0 0.0 0.0 14482.717304
196 2021-05-31 14179.267023 10831.761174 17247.841360 14079.952192 14272.413593 -0.001677 -0.001677 -0.001677 -0.001677 -0.001677 -0.001677 0.0 0.0 0.0 14155.494786
197 2021-06-30 14202.319662 11006.940530 17701.332748 14090.642318 14306.979573 0.002357 0.002357 0.002357 0.002357 0.002357 0.002357 0.0 0.0 0.0 14235.800603
198 2021-07-31 14226.140722 9889.022069 16659.764367 14102.642463 14343.314391 -0.060197 -0.060197 -0.060197 -0.060197 -0.060197 -0.060197 0.0 0.0 0.0 13369.764959
199 2021-08-31 14249.961782 9937.149309 16640.462310 14111.614988 14379.697203 -0.065345 -0.065345 -0.065345 -0.065345 -0.065345 -0.065345 0.0 0.0 0.0 13318.800627
200 2021-09-30 14273.014421 11032.667426 17978.983562 14124.857835 14416.314740 0.028336 0.028336 0.028336 0.028336 0.028336 0.028336 0.0 0.0 0.0 14677.460913
201 2021-10-31 14296.835481 10865.917131 17628.795647 14137.350825 14451.730539 0.002634 0.002634 0.002634 0.002634 0.002634 0.002634 0.0 0.0 0.0 14334.486572
202 2021-11-30 14319.888119 11569.048589 18450.711695 14150.034263 14487.166776 0.049001 0.049001 0.049001 0.049001 0.049001 0.049001 0.0 0.0 0.0 15021.571904
203 2021-12-31 14343.709179 12445.336429 18953.402446 14163.981372 14525.176610 0.096484 0.096484 0.096484 0.096484 0.096484 0.096484 0.0 0.0 0.0 15727.643182
204 2022-01-31 14367.530239 11887.332003 18556.699460 14172.235099 14569.617599 0.048190 0.048190 0.048190 0.048190 0.048190 0.048190 0.0 0.0 0.0 15059.900807
205 2022-02-28 14389.046036 12023.492851 18598.415480 14175.793384 14604.764607 0.051629 0.051629 0.051629 0.051629 0.051629 0.051629 0.0 0.0 0.0 15131.933948
206 2022-03-31 14412.867096 11382.878410 18070.743472 14180.612739 14642.651376 0.027505 0.027505 0.027505 0.027505 0.027505 0.027505 0.0 0.0 0.0 14809.295331

207 rows × 16 columns

In [1590]:
figure=m.plot(forecast,xlabel='Date',ylabel='Portfolio Asset')
In [1592]:
figure=m.plot_components(forecast)

Add a regressor to time series

In [1045]:
#fed_rate
In [1044]:
#WA_fund2
In [1131]:
fed_rate_ts = fed_rate
#fed_rate_ts.set_index('DATE')
In [1132]:
WA_fund2_regressor=pd.merge(fed_rate_ts,WA_fund2, left_on ='DATE', right_on = 'Date')
In [1096]:
def effr_envior(EFFR):
    if EFFR > 0.25:
        return 1
    else:
        return 0
In [1097]:
WA_fund2_regressor.dtypes
Out[1097]:
DATE                           datetime64[ns]
EFFR                                  float64
Date_x                         datetime64[ns]
flag                                   object
Date_y                         datetime64[ns]
Portfolio Assets (mo/$mils)           float64
y                                     float64
ds                             datetime64[ns]
dtype: object
In [1098]:
WA_fund2_regressor['EFFR'].fillna(WA_fund2_regressor['EFFR'].median(), inplace = True)
In [1134]:
m=Prophet()
m.add_regressor('EFFR',standardize=False)
m.fit(WA_fund2_regressor)
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[1134]:
<fbprophet.forecaster.Prophet at 0x7f662b297860>
In [1133]:
WA_fund2_regressor=WA_fund2_regressor[['y','ds','EFFR']]
In [1143]:
#future=m.make_future_dataframe(periods=24,freq='M')

future['EFFR']=WA_fund2_regressor['EFFR']
future.reset_index(drop=True)
future['EFFR'].fillna(future['EFFR'].median(), inplace = True)
forecast=m.predict(future)
forecast
Out[1143]:
ds trend yhat_lower yhat_upper trend_lower trend_upper EFFR EFFR_lower EFFR_upper additive_terms additive_terms_lower additive_terms_upper extra_regressors_additive extra_regressors_additive_lower extra_regressors_additive_upper yearly yearly_lower yearly_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yhat
0 2005-03-31 7669.785524 298.505859 6458.625806 7669.785524 7669.785524 -4312.653262 -4312.653262 -4312.653262 -4373.144162 -4373.144162 -4373.144162 -4312.653262 -4312.653262 -4312.653262 -60.490900 -60.490900 -60.490900 0.0 0.0 0.0 3296.641362
1 2005-05-31 7766.828085 -137.299187 6131.803464 7766.828085 7766.828085 -4502.060331 -4502.060331 -4502.060331 -4760.906725 -4760.906725 -4760.906725 -4502.060331 -4502.060331 -4502.060331 -258.846394 -258.846394 -258.846394 0.0 0.0 0.0 3005.921360
2 2005-06-30 7814.553935 -193.873583 6058.277655 7814.553935 7814.553935 -4880.874469 -4880.874469 -4880.874469 -4699.191266 -4699.191266 -4699.191266 -4880.874469 -4880.874469 -4880.874469 181.683203 181.683203 181.683203 0.0 0.0 0.0 3115.362669
3 2005-08-31 7913.187358 -904.560504 5308.652433 7913.187358 7913.187358 -5288.828156 -5288.828156 -5288.828156 -5659.748845 -5659.748845 -5659.748845 -5288.828156 -5288.828156 -5288.828156 -370.920689 -370.920689 -370.920689 0.0 0.0 0.0 2253.438513
4 2005-09-30 7960.913208 -1050.658404 5836.647493 7960.913208 7960.913208 -5725.921392 -5725.921392 -5725.921392 -5575.226603 -5575.226603 -5575.226603 -5725.921392 -5725.921392 -5725.921392 150.694790 150.694790 150.694790 0.0 0.0 0.0 2385.686605
5 2005-10-31 8010.229920 -1468.526671 4764.140166 8010.229920 8010.229920 -5857.049363 -5857.049363 -5857.049363 -6239.360190 -6239.360190 -6239.360190 -5857.049363 -5857.049363 -5857.049363 -382.310827 -382.310827 -382.310827 0.0 0.0 0.0 1770.869729
6 2005-11-30 8057.955770 68.495839 6468.226958 8057.955770 8057.955770 -5871.619138 -5871.619138 -5871.619138 -5045.014078 -5045.014078 -5045.014078 -5871.619138 -5871.619138 -5871.619138 826.605059 826.605059 826.605059 0.0 0.0 0.0 3012.941691
7 2006-01-31 8156.589193 -1165.750967 5097.904996 8156.589193 8156.589193 -6512.689217 -6512.689217 -6512.689217 -6050.593173 -6050.593173 -6050.593173 -6512.689217 -6512.689217 -6512.689217 462.096044 462.096044 462.096044 0.0 0.0 0.0 2105.996020
8 2006-02-28 8201.133319 -1264.064108 5097.860421 8201.133319 8201.133319 -6585.538090 -6585.538090 -6585.538090 -6212.660408 -6212.660408 -6212.660408 -6585.538090 -6585.538090 -6585.538090 372.877682 372.877682 372.877682 0.0 0.0 0.0 1988.472912
9 2006-03-31 8250.450031 -2224.367269 3969.813741 8250.450031 8250.450031 -7284.887267 -7284.887267 -7284.887267 -7286.615522 -7286.615522 -7286.615522 -7284.887267 -7284.887267 -7284.887267 -1.728254 -1.728254 -1.728254 0.0 0.0 0.0 963.834509
10 2006-05-31 8347.492592 -2276.454010 3789.638755 8347.492592 8347.492592 -7357.736140 -7357.736140 -7357.736140 -7613.820319 -7613.820319 -7613.820319 -7357.736140 -7357.736140 -7357.736140 -256.084179 -256.084179 -256.084179 0.0 0.0 0.0 733.672272
11 2006-06-30 8395.218442 -1932.911988 4298.288730 8395.218442 8395.218442 -7357.736140 -7357.736140 -7357.736140 -7190.919044 -7190.919044 -7190.919044 -7357.736140 -7357.736140 -7357.736140 166.817096 166.817096 166.817096 0.0 0.0 0.0 1204.299397
12 2006-07-31 8444.535153 -2648.174966 3531.069323 8444.535153 8444.535153 -7736.550278 -7736.550278 -7736.550278 -8119.339083 -8119.339083 -8119.339083 -7736.550278 -7736.550278 -7736.550278 -382.788805 -382.788805 -382.788805 0.0 0.0 0.0 325.196070
13 2006-08-31 8493.851865 -2642.568952 3545.063216 8493.851865 8493.851865 -7736.550278 -7736.550278 -7736.550278 -8033.559553 -8033.559553 -8033.559553 -7736.550278 -7736.550278 -7736.550278 -297.009275 -297.009275 -297.009275 0.0 0.0 0.0 460.292312
14 2006-10-31 8590.894426 -2717.693615 3714.074958 8590.894426 8590.894426 -7736.550278 -7736.550278 -7736.550278 -8069.561692 -8069.561692 -8069.561692 -7736.550278 -7736.550278 -7736.550278 -333.011414 -333.011414 -333.011414 0.0 0.0 0.0 521.332734
15 2006-11-30 8638.620276 -1319.538135 4753.794327 8638.620276 8638.620276 -7736.550278 -7736.550278 -7736.550278 -6949.069868 -6949.069868 -6949.069868 -7736.550278 -7736.550278 -7736.550278 787.480410 787.480410 787.480410 0.0 0.0 0.0 1689.550409
16 2007-01-31 8737.253700 -1873.227886 4392.373117 8737.253700 8737.253700 -7765.689827 -7765.689827 -7765.689827 -7406.798490 -7406.798490 -7406.798490 -7765.689827 -7765.689827 -7765.689827 358.891337 358.891337 358.891337 0.0 0.0 0.0 1330.455210
17 2007-02-28 8781.797826 -1795.757533 4716.451963 8781.797826 8781.797826 -7882.248023 -7882.248023 -7882.248023 -7427.928723 -7427.928723 -7427.928723 -7882.248023 -7882.248023 -7882.248023 454.319301 454.319301 454.319301 0.0 0.0 0.0 1353.869103
18 2007-04-30 8878.840386 -1957.454222 4291.862619 8878.840386 8878.840386 -7707.410729 -7707.410729 -7707.410729 -7749.427287 -7749.427287 -7749.427287 -7707.410729 -7707.410729 -7707.410729 -42.016558 -42.016558 -42.016558 0.0 0.0 0.0 1129.413099
19 2007-05-31 8928.157097 -2105.220896 4130.475729 8928.157097 8928.157097 -7692.840954 -7692.840954 -7692.840954 -7945.885274 -7945.885274 -7945.885274 -7692.840954 -7692.840954 -7692.840954 -253.044319 -253.044319 -253.044319 0.0 0.0 0.0 982.271824
20 2007-07-31 9025.199658 -2195.216192 3868.484476 9025.199658 9025.199658 -7692.840954 -7692.840954 -7692.840954 -8109.571733 -8109.571733 -8109.571733 -7692.840954 -7692.840954 -7692.840954 -416.730779 -416.730779 -416.730779 0.0 0.0 0.0 915.627924
21 2007-08-31 9074.516342 -1671.510513 4919.990009 9074.516342 9074.516342 -7226.608169 -7226.608169 -7226.608169 -7449.355146 -7449.355146 -7449.355146 -7226.608169 -7226.608169 -7226.608169 -222.746977 -222.746977 -222.746977 0.0 0.0 0.0 1625.161195
22 2007-10-31 9171.558849 -929.551880 5543.052995 9171.558849 9171.558849 -6702.096286 -6702.096286 -6702.096286 -6985.811725 -6985.811725 -6985.811725 -6702.096286 -6702.096286 -6702.096286 -283.715439 -283.715439 -283.715439 0.0 0.0 0.0 2185.747124
23 2007-11-30 9219.284673 90.548042 6332.582644 9219.284673 9219.284673 -6789.514933 -6789.514933 -6789.514933 -6042.168798 -6042.168798 -6042.168798 -6789.514933 -6789.514933 -6789.514933 747.346135 747.346135 747.346135 0.0 0.0 0.0 3177.115874
24 2007-12-31 9268.601357 1591.948185 7706.189052 9268.601357 9268.601357 -4458.351008 -4458.351008 -4458.351008 -4622.400631 -4622.400631 -4622.400631 -4458.351008 -4458.351008 -4458.351008 -164.049623 -164.049623 -164.049623 0.0 0.0 0.0 4646.200726
25 2008-01-31 9317.918041 1907.609633 7891.917038 9317.918041 9317.918041 -4691.467400 -4691.467400 -4691.467400 -4435.078858 -4435.078858 -4435.078858 -4691.467400 -4691.467400 -4691.467400 256.388542 256.388542 256.388542 0.0 0.0 0.0 4882.839183
26 2008-02-29 9364.053004 2420.359765 8198.045197 9364.053004 9364.053004 -4385.502135 -4385.502135 -4385.502135 -4170.624862 -4170.624862 -4170.624862 -4385.502135 -4385.502135 -4385.502135 214.877272 214.877272 214.877272 0.0 0.0 0.0 5193.428141
27 2008-03-31 9413.369688 2447.903888 9039.953074 9413.369688 9413.369688 -3657.013408 -3657.013408 -3657.013408 -3777.685415 -3777.685415 -3777.685415 -3657.013408 -3657.013408 -3657.013408 -120.672007 -120.672007 -120.672007 0.0 0.0 0.0 5635.684273
28 2008-04-30 9461.095512 3173.421728 9381.538106 9461.095512 9461.095512 -3453.036565 -3453.036565 -3453.036565 -3320.286826 -3320.286826 -3320.286826 -3453.036565 -3453.036565 -3453.036565 132.749739 132.749739 132.749739 0.0 0.0 0.0 6140.808686
29 2008-06-30 9558.138018 3199.545876 9447.366042 9558.138018 9558.138018 -3598.734310 -3598.734310 -3598.734310 -3402.391293 -3402.391293 -3402.391293 -3598.734310 -3598.734310 -3598.734310 196.343017 196.343017 196.343017 0.0 0.0 0.0 6155.746725
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 2019-05-31 15900.899694 9115.653566 15377.373999 15900.899694 15900.899694 -3496.745888 -3496.745888 -3496.745888 -3749.790208 -3749.790208 -3749.790208 -3496.745888 -3496.745888 -3496.745888 -253.044319 -253.044319 -253.044319 0.0 0.0 0.0 12151.109487
122 2019-07-31 15997.942196 8969.413083 15231.921406 15997.942196 15997.942196 -3496.745888 -3496.745888 -3496.745888 -3913.476667 -3913.476667 -3913.476667 -3496.745888 -3496.745888 -3496.745888 -416.730779 -416.730779 -416.730779 0.0 0.0 0.0 12084.465529
123 2019-09-30 16094.984698 10303.294068 16712.602373 16094.984698 16094.984698 -2768.257162 -2768.257162 -2768.257162 -2704.060832 -2704.060832 -2704.060832 -2768.257162 -2768.257162 -2768.257162 64.196329 64.196329 64.196329 0.0 0.0 0.0 13390.923866
124 2019-10-31 16144.301379 10375.340193 16750.433744 16144.301379 16144.301379 -2302.024376 -2302.024376 -2302.024376 -2585.739816 -2585.739816 -2585.739816 -2302.024376 -2302.024376 -2302.024376 -283.715439 -283.715439 -283.715439 0.0 0.0 0.0 13558.561563
125 2019-12-31 16241.343881 10945.716726 17053.532637 16241.343881 16241.343881 -2258.315053 -2258.315053 -2258.315053 -2422.364676 -2422.364676 -2422.364676 -2258.315053 -2258.315053 -2258.315053 -164.049623 -164.049623 -164.049623 0.0 0.0 0.0 13818.979205
126 2020-01-31 16290.660562 11172.844573 17280.456538 16290.660562 16290.660562 -2316.594151 -2316.594151 -2316.594151 -2060.205609 -2060.205609 -2060.205609 -2316.594151 -2316.594151 -2316.594151 256.388542 256.388542 256.388542 0.0 0.0 0.0 14230.454953
127 2020-02-29 16336.795522 12761.502978 19005.206735 16336.795522 16336.795522 -422.523462 -422.523462 -422.523462 -207.646189 -207.646189 -207.646189 -422.523462 -422.523462 -422.523462 214.877272 214.877272 214.877272 0.0 0.0 0.0 16129.149333
128 2020-03-31 16386.112203 12753.357431 18652.410938 16386.112203 16386.112203 -422.523462 -422.523462 -422.523462 -543.195468 -543.195468 -543.195468 -422.523462 -422.523462 -422.523462 -120.672007 -120.672007 -120.672007 0.0 0.0 0.0 15842.916735
129 2020-04-30 16433.838023 13101.037675 19160.912946 16433.838022 16433.838025 -422.523462 -422.523462 -422.523462 -289.773723 -289.773723 -289.773723 -422.523462 -422.523462 -422.523462 132.749739 132.749739 132.749739 0.0 0.0 0.0 16144.064301
130 2020-05-31 16483.154705 12898.485302 18946.447244 16483.154702 16483.154708 -422.523462 -422.523462 -422.523462 -683.872894 -683.872894 -683.872894 -422.523462 -422.523462 -422.523462 -261.349432 -261.349432 -261.349432 0.0 0.0 0.0 15799.281811
131 2020-06-30 16530.880525 13065.016213 19567.412996 16530.880520 16530.880531 -422.523462 -422.523462 -422.523462 -226.180445 -226.180445 -226.180445 -422.523462 -422.523462 -422.523462 196.343017 196.343017 196.343017 0.0 0.0 0.0 16304.700080
132 2020-07-31 16580.197206 12758.741095 19104.926439 16580.197198 16580.197215 -422.523462 -422.523462 -422.523462 -733.228847 -733.228847 -733.228847 -422.523462 -422.523462 -422.523462 -310.705386 -310.705386 -310.705386 0.0 0.0 0.0 15846.968359
133 2020-08-31 16629.513887 12570.269598 18810.965850 16629.513876 16629.513899 -422.523462 -422.523462 -422.523462 -866.934118 -866.934118 -866.934118 -422.523462 -422.523462 -422.523462 -444.410656 -444.410656 -444.410656 0.0 0.0 0.0 15762.579770
134 2020-09-30 16677.239708 13684.846398 19602.868763 16677.239694 16677.239722 -422.523462 -422.523462 -422.523462 -229.287099 -229.287099 -229.287099 -422.523462 -422.523462 -422.523462 193.236363 193.236363 193.236363 0.0 0.0 0.0 16447.952609
135 2020-10-31 16726.556389 12661.733010 19061.512646 16726.556371 16726.556407 -422.523462 -422.523462 -422.523462 -854.093380 -854.093380 -854.093380 -422.523462 -422.523462 -422.523462 -431.569919 -431.569919 -431.569919 0.0 0.0 0.0 15872.463009
136 2020-11-30 16774.282210 14205.555389 20378.617265 16774.282188 16774.282232 -422.523462 -422.523462 -422.523462 442.167374 442.167374 442.167374 -422.523462 -422.523462 -422.523462 864.690835 864.690835 864.690835 0.0 0.0 0.0 17216.449583
137 2020-12-31 16823.598891 13060.686295 19469.423247 16823.598865 16823.598917 -422.523462 -422.523462 -422.523462 -667.467017 -667.467017 -667.467017 -422.523462 -422.523462 -422.523462 -244.943556 -244.943556 -244.943556 0.0 0.0 0.0 16156.131874
138 2021-01-31 16872.915572 14052.289875 20060.517546 16872.915542 16872.915603 -422.523462 -422.523462 -422.523462 143.380013 143.380013 143.380013 -422.523462 -422.523462 -422.523462 565.903474 565.903474 565.903474 0.0 0.0 0.0 17016.295585
139 2021-02-28 16917.459671 13775.883652 19696.290562 16917.459638 16917.459706 -422.523462 -422.523462 -422.523462 -129.489658 -129.489658 -129.489658 -422.523462 -422.523462 -422.523462 293.033803 293.033803 293.033803 0.0 0.0 0.0 16787.970013
140 2021-03-31 16966.776352 13553.422295 19780.277943 16966.776315 16966.776392 -422.523462 -422.523462 -422.523462 -483.014361 -483.014361 -483.014361 -422.523462 -422.523462 -422.523462 -60.490900 -60.490900 -60.490900 0.0 0.0 0.0 16483.761991
141 2021-04-30 17014.502173 13305.339731 19856.360107 17014.502131 17014.502217 -422.523462 -422.523462 -422.523462 -346.831655 -346.831655 -346.831655 -422.523462 -422.523462 -422.523462 75.691807 75.691807 75.691807 0.0 0.0 0.0 16667.670518
142 2021-05-31 17063.818854 13130.635757 19543.645687 17063.818809 17063.818903 -422.523462 -422.523462 -422.523462 -681.369855 -681.369855 -681.369855 -422.523462 -422.523462 -422.523462 -258.846394 -258.846394 -258.846394 0.0 0.0 0.0 16382.448999
143 2021-06-30 17111.544675 13836.273521 20083.017911 17111.544625 17111.544729 -422.523462 -422.523462 -422.523462 -240.840258 -240.840258 -240.840258 -422.523462 -422.523462 -422.523462 181.683203 181.683203 181.683203 0.0 0.0 0.0 16870.704416
144 2021-07-31 17160.861356 13037.584381 19574.384336 17160.861300 17160.861418 -422.523462 -422.523462 -422.523462 -769.960966 -769.960966 -769.960966 -422.523462 -422.523462 -422.523462 -347.437505 -347.437505 -347.437505 0.0 0.0 0.0 16390.900389
145 2021-08-31 17210.178037 13267.622461 19488.375451 17210.177976 17210.178107 -422.523462 -422.523462 -422.523462 -793.444150 -793.444150 -793.444150 -422.523462 -422.523462 -422.523462 -370.920689 -370.920689 -370.920689 0.0 0.0 0.0 16416.733887
146 2021-09-30 17257.903858 13881.555329 20405.199550 17257.903792 17257.903933 -422.523462 -422.523462 -422.523462 -271.828672 -271.828672 -271.828672 -422.523462 -422.523462 -422.523462 150.694790 150.694790 150.694790 0.0 0.0 0.0 16986.075186
147 2021-10-31 17307.220539 13272.415808 19458.118570 17307.220467 17307.220620 -422.523462 -422.523462 -422.523462 -804.834289 -804.834289 -804.834289 -422.523462 -422.523462 -422.523462 -382.310827 -382.310827 -382.310827 0.0 0.0 0.0 16502.386250
148 2021-11-30 17354.946359 14624.942328 20943.205368 17354.946281 17354.946446 -422.523462 -422.523462 -422.523462 404.081598 404.081598 404.081598 -422.523462 -422.523462 -422.523462 826.605059 826.605059 826.605059 0.0 0.0 0.0 17759.027957
149 2021-12-31 17404.263040 13625.888327 19831.310931 17404.262957 17404.263136 -422.523462 -422.523462 -422.523462 -640.006956 -640.006956 -640.006956 -422.523462 -422.523462 -422.523462 -217.483495 -217.483495 -217.483495 0.0 0.0 0.0 16764.256084
150 2022-01-31 17453.579722 14346.298652 20395.492702 17453.579632 17453.579823 -422.523462 -422.523462 -422.523462 39.572583 39.572583 39.572583 -422.523462 -422.523462 -422.523462 462.096044 462.096044 462.096044 0.0 0.0 0.0 17493.152304

151 rows × 22 columns

In [1144]:
figure=m.plot(forecast,xlabel='Date',ylabel='Portfolio Asset')
In [1145]:
figure=m.plot_components(forecast)
In [ ]: